- module allows you to load a tool
- man or the options --help, -h (RTFM!)
- srun, sbatch (IFB documentation)
Create the directory M5 (i.e. Module5) and move into it. You should see this when you use the tree command:
.
├── CLEANING
├── FASTQ
├── MAPPING
└── QC
mkdir -p ~/M5/FASTQ # -p: no error if existing, make parent directories as needed
mkdir -p ~/M5/CLEANING
mkdir -p ~/M5/MAPPING
mkdir -p ~/M5/QC
cd ~/M5
tree ~/M5 # list contents of directories in a tree-like format.
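The four mkdir calls above can also be written as a loop. This is a minimal sketch, not the reference solution; M5_DIR is a hypothetical variable introduced for illustration and defaults to the ~/M5 directory of the exercise.

```shell
# M5_DIR is an assumption for illustration; it defaults to the ~/M5 used above.
M5_DIR="${M5_DIR:-$HOME/M5}"

# Create every subdirectory; -p also creates M5 itself and ignores existing ones.
for d in FASTQ CLEANING MAPPING QC; do
    mkdir -p "$M5_DIR/$d"
done

# List what was created (ls is used here in case tree is not installed).
ls -1 "$M5_DIR"
```

Setting M5_DIR to another path before running the loop builds the same layout elsewhere, which can help when testing.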
- wget, or fasterq-dump from sra-tools
- gzip
You should have these files in the FASTQ directory:
ls -ltrh ~/M5/FASTQ/
total 236M
-rw-rw-r-- 1 orue orue 127M 6 mars 12:32 SRR8082143_2.fastq.gz
-rw-rw-r-- 1 orue orue 109M 6 mars 12:32 SRR8082143_1.fastq.gz
Get the data by the method of your choice, either wget or fasterq-dump from sra-tools. Run the commands from ~/M5/FASTQ so that the files land in the FASTQ directory:
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR808/003/SRR8082143/SRR8082143_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR808/003/SRR8082143/SRR8082143_2.fastq.gz
module load sra-tools
srun fasterq-dump -S -p SRR8082143 --outdir . --threads 1
Then compress the files:
gzip *.fastq
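Once the files are compressed, it can be worth checking that they are intact and counting the reads they contain. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines); the function name count_reads is hypothetical.

```shell
# count_reads FILE: print the number of reads in a gzipped FASTQ file.
# Assumes 4 lines per record; returns non-zero if the archive is corrupt.
count_reads() {
    gzip -t "$1" || return 1              # integrity check of the gzip stream
    lines=$(gzip -dc "$1" | wc -l)        # total number of lines
    echo $((lines / 4))                   # 4 lines per FASTQ record
}

# Report the read count of every downloaded file, if any are present.
for f in "$HOME"/M5/FASTQ/*.fastq.gz; do
    [ -e "$f" ] || continue               # the glob matched nothing
    printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```

For paired-end data, both files of a pair are expected to report the same number of reads.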
Run FastQC on the raw reads and write the results to the QC directory (use 8 threads). You should obtain these files:
ls -ltrh ~/M5/QC
total 1,9M
-rw-rw-r-- 1 orue orue 321K 6 mars 13:23 SRR8082143_1_fastqc.zip
-rw-rw-r-- 1 orue orue 642K 6 mars 13:23 SRR8082143_1_fastqc.html
-rw-rw-r-- 1 orue orue 333K 6 mars 13:23 SRR8082143_2_fastqc.zip
-rw-rw-r-- 1 orue orue 642K 6 mars 13:23 SRR8082143_2_fastqc.html
cd ~/M5
module load fastqc
srun --cpus-per-task 8 fastqc FASTQ/SRR8082143_1.fastq.gz -o QC/ -t 8
srun --cpus-per-task 8 fastqc FASTQ/SRR8082143_2.fastq.gz -o QC/ -t 8
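Besides the HTML report, each FastQC zip archive contains a tab-separated summary.txt (status, module, file name). A rough sketch for listing only the modules that did not pass, assuming unzip is available; the function name flagged_modules is hypothetical.

```shell
# flagged_modules: read a FastQC summary.txt on stdin and print
# every module whose status is not PASS (i.e. WARN or FAIL).
flagged_modules() {
    awk -F '\t' '$1 != "PASS" { print $1 ": " $2 " (" $3 ")" }'
}

# Extract summary.txt from each archive without unpacking it on disk.
for z in "$HOME"/M5/QC/*_fastqc.zip; do
    [ -e "$z" ] || continue               # the glob matched nothing
    unzip -p "$z" '*/summary.txt' | flagged_modules
done
```

This gives a quick text overview before opening the HTML files in a browser.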
module load fastp
cd ~/M5
# -l 100: discard reads shorter than 100 bp
# -w 8: use 8 worker threads (in fastp, -t trims bases from the read tail; it does not set the thread count)
srun --cpus-per-task 8 fastp \
  --in1 FASTQ/SRR8082143_1.fastq.gz --in2 FASTQ/SRR8082143_2.fastq.gz \
  --out1 CLEANING/SRR8082143_1.cleaned_filtered.fastq.gz \
  --out2 CLEANING/SRR8082143_2.cleaned_filtered.fastq.gz \
  --unpaired1 CLEANING/SRR8082143_singles.fastq.gz \
  --unpaired2 CLEANING/SRR8082143_singles.fastq.gz \
  -l 100 -w 8 \
  -j CLEANING/fastp.json -h CLEANING/fastp.html
ls -ltrh ~/M5/CLEANING/
total 245M
-rw-rw-r-- 1 orue orue 113M 6 mars 12:59 SRR8082143_1.cleaned_filtered.fastq.gz
-rw-rw-r-- 1 orue orue 162K 6 mars 12:59 fastp.json
-rw-rw-r-- 1 orue orue 525K 6 mars 12:59 fastp.html
-rw-rw-r-- 1 orue orue 2,2M 6 mars 12:59 SRR8082143_singles.fastq.gz
-rw-rw-r-- 1 orue orue 130M 6 mars 12:59 SRR8082143_2.cleaned_filtered.fastq.gz
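Paired-end mappers expect the two cleaned files to list the same reads in the same order, so a quick sanity check is to compare their line counts. This is a sketch under that assumption; the function name same_read_count is hypothetical, and equal counts do not prove that the read names match.

```shell
# same_read_count R1 R2: succeed if both gzipped FASTQ files
# contain the same number of lines (hence the same number of reads).
same_read_count() {
    n1=$(gzip -dc "$1" | wc -l)
    n2=$(gzip -dc "$2" | wc -l)
    [ "$((n1))" -eq "$((n2))" ]
}

r1="$HOME/M5/CLEANING/SRR8082143_1.cleaned_filtered.fastq.gz"
r2="$HOME/M5/CLEANING/SRR8082143_2.cleaned_filtered.fastq.gz"

# Only run the comparison when both files exist.
if [ -e "$r1" ] && [ -e "$r2" ]; then
    if same_read_count "$r1" "$r2"; then
        echo "paired files have the same number of reads"
    else
        echo "paired files differ in length" >&2
    fi
fi
```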
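Aggregate the reports with MultiQC, writing the output to the CLEANING directory. You should obtain these files:
ls -ltrh ~/M5/CLEANING/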
total 248M
-rw-rw-r-- 1 orue orue 113M 6 mars 12:59 SRR8082143_1.cleaned_filtered.fastq.gz
-rw-rw-r-- 1 orue orue 162K 6 mars 12:59 fastp.json
-rw-rw-r-- 1 orue orue 525K 6 mars 12:59 fastp.html
-rw-rw-r-- 1 orue orue 2,2M 6 mars 12:59 SRR8082143_singles.fastq.gz
-rw-rw-r-- 1 orue orue 130M 6 mars 12:59 SRR8082143_2.cleaned_filtered.fastq.gz
-rw-rw-r-- 1 orue orue 1,2M 6 mars 13:28 multiqc_report.html
drwxrwxr-x 2 orue orue 2,0M 6 mars 13:28 multiqc_data
cd ~/M5
module load multiqc
multiqc -d . -o CLEANING
Map the cleaned reads to the reference genome /shared/projects/dubii2020/data/module5/seance1/CP031214.1.fasta with bwa. You should obtain this file:
ls -ltrh ~/M5/MAPPING/
total 249M
-rw-rw-r-- 1 orue orue 249M 6 mars 13:01 SRR8082143.bam
cd ~/M5
module load bwa
module load samtools # samtools converts the SAM output of bwa to BAM
## Only needed if the reference is not indexed yet: srun bwa index sequence.fasta
srun --cpus-per-task=33 bwa mem -t 32 \
  /shared/projects/dubii2020/data/module5/seance1/CP031214.1.fasta \
  CLEANING/SRR8082143_1.cleaned_filtered.fastq.gz \
  CLEANING/SRR8082143_2.cleaned_filtered.fastq.gz \
  | samtools view -hbS - > MAPPING/SRR8082143.bam
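The same mapping can be submitted as a batch job with sbatch instead of running interactively with srun, so that both bwa and samtools run on the compute node. The sketch below only writes the script and checks its syntax; the file name mapping.sbatch, the job name and the log file name are hypothetical choices, and the module names are assumed to match those available on the cluster.

```shell
# Write a batch script; the file name mapping.sbatch is a hypothetical choice.
cat > mapping.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=bwa_mem
#SBATCH --cpus-per-task=32
#SBATCH --output=mapping_%j.log

module load bwa samtools
set -euo pipefail

REF=/shared/projects/dubii2020/data/module5/seance1/CP031214.1.fasta
cd ~/M5

# Slurm sets SLURM_CPUS_PER_TASK from the --cpus-per-task request above.
bwa mem -t "$SLURM_CPUS_PER_TASK" "$REF" \
    CLEANING/SRR8082143_1.cleaned_filtered.fastq.gz \
    CLEANING/SRR8082143_2.cleaned_filtered.fastq.gz \
  | samtools view -b -o MAPPING/SRR8082143.bam -
EOF

# Check the syntax locally without running anything.
bash -n mapping.sbatch && echo "OK, submit with: sbatch mapping.sbatch"
```

Reading the thread count from SLURM_CPUS_PER_TASK keeps the bwa setting in step with the resources requested in the header.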
1. Andrews S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
A work by Migale Bioinformatics Facility
https://migale.inrae.fr
Our two affiliations to cite us:
Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, 78350, Jouy-en-Josas, France